PostgreSQL Blog


commit_delay for more efficiency

You can speed up a transactional workload considerably by disabling the parameter synchronous_commit. But the possibility of losing committed transactions during an operating system crash makes that setting a non-starter for many applications. So I decided to write about an alternative: commit_delay.

WAL flushes as a bottleneck for transactional database workloads

To make sure that committed transactions cannot get lost, PostgreSQL has to make sure that the WAL for the transaction is flushed to disk before it can report success to the client. If the database workload is dominated by small data modifications, the IOPS generated by these transactions can saturate the disk, even though the amount of data written is moderate.

The parameter pair commit_delay and commit_siblings can relax the bottleneck by reducing the number of IOPS necessary for those WAL flushes.
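To see how much WAL flushing a workload actually causes, you can watch the cumulative statistics view pg_stat_wal (available since PostgreSQL v14), whose wal_sync column counts flush requests. A minimal sketch, assuming a reachable server and default psql connection settings:

```shell
# Sketch: observe WAL flush activity around a test run.
# Assumes a running PostgreSQL server (v14 or later) and psql defaults.
psql -c "SELECT wal_sync, wal_sync_time FROM pg_stat_wal"
# run your workload, then repeat the query; the difference in wal_sync is the
# number of WAL flushes (and wal_sync_time the time spent in them) during the run
```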

How do commit_delay and commit_siblings work?

You activate the feature by setting commit_delay to a value greater than zero. Whenever a transaction reaches the point where it would flush the WAL to disk during a commit, it first examines how many other transactions are currently active. If there are at least commit_siblings other transactions open and not waiting for a lock, PostgreSQL doesn't flush the WAL immediately, but waits for commit_delay microseconds. After that delay, some other transactions may have reached the point when they are ready to flush the WAL. All these backends can then perform their WAL flush in a single I/O operation.
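In a configuration file, that could look as follows. The values here are only illustrative starting points, not recommendations; commit_delay in particular has to be found experimentally, as the benchmark below shows:

```
# postgresql.conf sketch — values are illustrative, tune for your workload
commit_delay = 1000      # microseconds to wait before flushing WAL on commit
commit_siblings = 5      # at least this many other open transactions are
                         # required before the delay is applied (5 is the default)
```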

commit_delay is not easy to tune: if you choose a value that is too high, the delay makes every transaction take longer. On the other hand, if you choose a value that is too low, no other transaction may be ready to flush by the time the delay has passed, and you won't reduce the number of IOPS at all.

Setup for the commit_delay benchmark

The benchmark was run on my ASUS ZenBook UX433F notebook with local NVME disk, 8 CPU cores and 16GB RAM. I set shared_buffers = 3GB, max_wal_size = 100GB and checkpoint_timeout = 15min. Then I initialized the standard pgbench database with a scale factor of 100. I used pg_prewarm to load all the pgbench tables and indexes into shared buffers. That way, there should be no reading I/O ever, and, apart from checkpoints, the only I/O would be WAL writes.
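The prewarming step could look like this — a sketch that assumes the pg_prewarm extension is available and that psql connects to the database holding the pgbench tables:

```shell
# Sketch: load all pgbench tables and indexes into shared buffers.
# Assumes the pg_prewarm extension is installed on the server and that
# psql connects to the benchmark database by default.
psql -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm"
psql -c "SELECT relname, pg_prewarm(oid) AS blocks_loaded
         FROM pg_class
         WHERE relname LIKE 'pgbench%' AND relkind IN ('r', 'i')"
```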

The pgbench command I used was

pgbench -b simple-update -c 10 -T 1200

Throttling the disk

The built-in NVME in my laptop is so powerful that I couldn't saturate it with pgbench. Therefore, I decided to use Linux control groups to throttle the device to 1000 IOPS. On my Fedora 40 system, I had to enable I/O control for the systemd slices:

echo '+memory +pids +io' > /sys/fs/cgroup/system.slice/cgroup.subtree_control

Then, I could set the IOPS limit on the NVME for writing for the PostgreSQL v17 service:

echo '259:0 wiops=1000' > /sys/fs/cgroup/system.slice/postgresql-17.service/io.max

You could argue that that makes my test artificial. However, people who host their databases in a public cloud are constrained by limits just like this one. And then, you can never directly apply the results of a benchmark to a different system and workload anyway.

Results of the commit_delay benchmark

Benchmark results
commit_delay    transactions per second    IOPS
0 μs            1576                       1000
10 μs           1703                       1000
30 μs           1715                       1000
50 μs           1778                       1000
100 μs          1837                       1000
200 μs          1933                       1000
500 μs          2183                       1000
750 μs          2583                       900
1000 μs         2738                       600
1250 μs         2508                       510
1500 μs         2397                       480
2000 μs         2051                       430

We achieved the best performance with a commit_delay of 1000 μs. With that setting, pgbench performed somewhat less than twice as many transactions per second as without commit_delay (2738 versus 1576), while using only 600 IOPS — on average, more than four commits shared a single WAL flush. It is interesting to note that at the optimum, the disk is far from saturated, so it might be possible to achieve even better results.

Conclusion

While commit_delay doesn't boost the performance of a transactional workload in the same way that synchronous_commit = off can, we were still able to achieve a substantial performance improvement. If you cannot afford to lose transactions after an operating system crash, tuning commit_delay is the best you can do to speed up a workload consisting of short transactions.






